Main

Episodic memory allows us to retrieve specific experiences from our past, mentally reactivating co-occurring perceptual details and the broader sequence of events in which they were embedded1. The law of forgetting, or memory transience, dictates that these memories fade with time2,3,4. Yet memory does not decline simply as a function of clock time and calendar time5. Forgetting depends on, among other factors, neurocognitive states intervening between encoding and retrieval—especially sleep. A century of experimental evidence shows that sleeping after an event results in better retention than matched periods of wakefulness6,7,8. It remains unclear, however, whether sleep’s effects on episodic memory derive from passively sheltering it from interference7,9,10 or from actively transforming memories into more durable or adaptive formats8,11,12,13.

Active systems consolidation theories are rooted in the observation of sleep-specific neural dynamics that are linked to the preservation of newly encoded experiences8,14,15,16. Initially labile episodic memory traces are thought to be stabilized via repeated and time-compressed hippocampal-to-cortical replay during sleep17,18, offering a mechanism by which one-time experiences can stick in memory. In rodents, sleep-related replay occurs during hippocampal sharp-wave ripples, which in turn are hierarchically nested in sleep spindles (brief 11–16 Hz bursts) and slow waves (0.5–4 Hz oscillations) during slow-wave sleep (SWS)14,19. Evidence in humans is consistent with this proposed mechanism (for example, refs. 20,21,22)—episodic memory performance varies with SWS duration and microstructure (for example, ref. 23; for reviews, see refs. 8,24), particularly temporal or phase-based spindle–slow wave coupling, which facilitates sleep-related memory reactivation25 and replay-linked synaptic plasticity19,26.

However, the evidence for sleep’s active role in shaping episodic memory is mixed, as failed replications and task dependence make it unclear which retrieval tasks or processes should be most affected27,28. Moreover, the presumption that sleep-related consolidation transforms memories from specific (that is, episodic) to abstracted, generalized knowledge via hippocampal-to-cortical information transfer8,19,29 does not explain the role of sleep (if any) in stabilizing or enhancing episodic memory for specific experiences. Relatedly, if sleep adaptively transforms rather than passively preserves episodic memory, then some elements of memory should improve in absolute terms from pre- to post-sleep, beyond a mere reduction in forgetting relative to wakefulness. However, there is little or no evidence for the predicted sleep-induced above-baseline enhancement of episodic memory, perhaps owing to insensitive behavioural measurement. Sleep may in some cases improve non-episodic memory tasks, such as motor sequence learning or statistical learning30,31 (but see refs. 32,33). However, in studies of episodic memory, sleep at best reduces the amount of forgetting relative to wakefulness24 (but see ref. 34).

These discrepancies may be attributable in part to the differential forgetting of heterogeneous episodic representations over time35,36,37,38 and perhaps during sleep. One idea is that sleep serves as a ‘memory triage’ system, tagging only the most relevant pieces of information to persist over time24,39, such as information that is emotional13,40,41,42, congruent with prior knowledge43 or personally relevant for future use (for example, refs. 44,45). Yet gaps remain in our understanding of how sleep differentially impacts the two defining types of associations underlying episodic memories, which are linked to distinct hippocampal–cortical networks: (1) associations linking simultaneously encoded event features (for example, hearing your favourite song played in a concert), and (2) sequential associations bridging temporal gaps between successive events (for example, the songs played before and after your favourite song)1,46,47,48.

The mechanics of sleep-related replay suggest that it should preferentially enhance sequential organization in memory representations—associations bridging experiences across time and space—as opposed to simultaneously encoded features. In rodents, neural replay recapitulates the spatio-temporal order of waking experience and compresses behavioural-timescale gaps between encoded events (that is, seconds or longer) down to the millisecond-level timescale of synaptic plasticity, supporting subsequent post-sleep navigation17,49. Accordingly, in humans, sleep benefits the detection of temporal rules or regularities more than other kinds of associative relations across studies50, and sleep has also been found to reduce forgetting of temporal order information relative to wakefulness51,52. Yet sleep’s particular influence on the retrieval of sequential versus atemporal episodic associations making up a single, continuous one-shot experience remains unknown. Furthermore, little is known about sleep’s influence on episodic memory for real-world events experienced outside of the laboratory—which is more durable than memory for laboratory stimuli53—beyond delays of a few hours or a day54,55, despite evidence that memory consolidation or transformation unfolds over weeks to months or longer5,56,57.

We examined (1) how memory for sequential versus featural associations from a single immersive, real-world event transforms with sleep and over longer timescales (study 1); (2) whether sleep per se is necessary for differential sequence versus featural memory transformation (study 2a); and (3) whether sleep-related memory transformation is related to the duration and predicted neural mechanisms of SWS (study 2b). We devised a new test of episodic memory for a controlled but immersive real-world event—the Baycrest Tour46,53, an audio-guided walking tour of artwork—paralleling the goal-directed and ecologically scaled paradigms used in rodent research (Fig. 1). Trial-unique and difficulty-matched memory tests probed both the sequence structure (for example, “You encountered the bicycle sculpture before the scale model of Baycrest”) and static perceptual features (for example, “The bicycle sculpture was made of metal”) of the tour items before and after a night of sleep, and at subsequent delays from days to over a year after the tour. We hypothesized that sleep transforms episodic memory across short- and long-term timescales by preferentially enhancing sequential structure in memory over atemporal perceptual associations, and that this enhancement would be related to SWS duration and its physiological hallmarks: spindles, slow waves and particularly spindle–slow wave coupling.

Fig. 1: Overview of study design.

a, Depiction of the beginning of the audio-guided tour route (dashed line), including the locations and examples of the first 13 target events. b, Each memory test included trial-unique, difficulty-matched true/false questions probing sequence (for example, “You encountered the piece called ‘Rotation’ before the shuffleboard game”; blue text) and featural (for example, “The piece called ‘Rotation’ is made of wood”; orange text) information from the tour. Some items have been altered from the original due to copyright restrictions. c, Study design: four unique true/false recognition memory tests were administered at four time points (T1–T4), with test order counterbalanced across participants. In study 1 (N = 53), the tour was completed during regular business hours, with T1 occurring at 1 h and T2 occurring at 24 h post-encoding. Study 2a participants underwent identical encoding conditions but were randomized to sleep (N = 39) or wake (N = 38) groups, with T1 occurring at 1 h post-encoding and T2 occurring after 12 h in both groups. Critically, participants either slept overnight in the sleep laboratory (sleep group) or stayed awake (wake group) during this 12 h delay. T3 and T4 occurred at 1 week and 1 month post-encoding for both study 1 and study 2a. A total of 49 study 2a participants (N = 26 sleep, N = 23 wake) completed a post hoc 5th test (T5) 15 months post-encoding. Study 2b (not depicted) comprised the study 2a sleep group plus 10 additional participants who underwent the same procedures as the sleep group (total N = 49) to increase power for individual differences analyses linking sleep neurophysiology to memory change. See Supplementary Methods (p. 2) for participant demographics and exclusion criteria. Credit: a (clockwise from upper right), Shofarot (Lucien Krief), Autumn Song (Judy Singer), Rotation (Ina Gilbert); reproductions of artwork used with permission.

In study 1, we found differential transformation of sequence versus featural memory from 1 h to 1 month: sequence memory significantly increased above baseline over a 24-h delay, including a night of sleep, and remained stable up to a month later, whereas featural memory declined monotonically at the classic Ebbinghausian forgetting rate (linearly on an approximately logarithmic timescale)3,4. In study 2a, we replicated study 1 and showed that sleep (contrasted with a matched period of wakefulness in a between-subjects design) was necessary for the boost in sequence memory, also finding that the sleep-related enhancement of sequence memory held at a 15-month follow-up. In study 2b, using overnight polysomnography (PSG) recording, we found that individual differences in SWS duration and neurophysiology—particularly spindles, slow waves and their coupling—were positively associated with enhanced memory for this real-life event.

Results

Sleep-related enhancement of sequence versus featural memory

As expected, given the law of forgetting, overall memory accuracy declined from T1 to T4 (χ2(1) = 68.54, P < 0.001; Fig. 2a) in study 1, but this effect was qualified by a crossover interaction between time and retrieval type (sequence, featural) (χ2(1) = 53.77, P < 0.001). The significant advantage for featural memory over sequence memory observed just after encoding (T1, z = −3.26, P = 0.001, Cohen's d = −0.46, 95% confidence interval (CI) = [−0.74, −0.17]) flipped in direction after a night’s sleep (T2, z = 0.27, P = 0.789, d = 0.12, 95% CI = [−0.15, 0.39]), with a significant advantage for sequence over featural memory at 1 week (T3, z = 2.26, P = 0.024, d = 0.43, 95% CI = [0.14, 0.71]) that widened over time (T4, z = 3.65, P < 0.001, d = 0.84, 95% CI = [0.52, 1.15]). Sequence and featural memory accuracy did not differ overall (χ2(1) = 0.69, P = 0.405). Considering the overnight trajectories of sequence versus featural memory, sequence memory increased significantly overnight (z = 2.65, P = 0.008, d = 0.33, 95% CI = [0.05, 0.61]), whereas featural memory significantly declined (z = −2.38, P = 0.018, d = −0.29, 95% CI = [−0.57, −0.02]) (Fig. 2b).

Fig. 2: Transformation of sequence versus featural memory over time.

Sequence (blue) versus featural (orange) memory accuracy across T1–T4, approximately logarithmically spaced across time to fit a negatively accelerated forgetting function. Dots depict means, vertical lines and shaded regions depict between-subjects standard errors. a, Study 1 (N = 53). We observed a significant advantage for sequence over featural memory at 1 week (T3, z = 2.26, P = 0.024, d = 0.43, 95% CI = [0.14, 0.71]) that widened over time (T4, z = 3.65, P < 0.001, d = 0.84, 95% CI = [0.52, 1.15]; two-sided uncorrected tests on estimated marginal means from a generalized linear mixed effects model). b, Accuracy change from T1 to T2 in study 1. Thin lines depict individual study 1 participants, and circles and thick lines depict group averages. Sequence change and featural change were not significantly correlated (r(51) = 0.002, P = 0.986, 95% CI = [−0.27, 0.27]). c, Study 2a sleep group (N = 39) who slept in the sleep laboratory between T1 and T2. Sequence memory performance exceeded featural memory performance after 1 month (T4, z = 2.11, P = 0.035, d = 0.47, 95% CI = [0.13, 0.80]) but not earlier (T3, z = 1.22, P = 0.221, d = 0.28, 95% CI = [−0.04, 0.60]; T2, z = 0.11, P = 0.915, d = 0.05, 95% CI = [−0.26, 0.37]; T1, z = −1.66, P = 0.097, d = −0.25, 95% CI = [−0.57, 0.07]; two-sided uncorrected tests on estimated marginal means from a generalized linear model). d, Study 2a wake group (N = 38) who did not sleep between T1 and T2. Sequence and featural memory did not differ at any time points (T4, z = 1.37, P = 0.170, d = 0.30, 95% CI = [0.03, 0.63]; T3, z = 0.46, P = 0.645, d = 0.16, 95% CI = [−0.17, 0.48]; T2, z = −0.92, P = 0.356, d = −0.08, 95% CI = [−0.41, 0.23]; T1, z = −1.47, P = 0.143, d = −0.18, 95% CI = [−0.51, 0.15]).

The improvement of sequence versus featural memory over the 24-h interval from T1 to T2 could not be explained by item characteristics such as difficulty or true/false phrasing (Supplementary Fig. 1). Sequence memory improved as the inter-item lag (that is, spatio-temporal distance) between the tour events in each pair increased in both study 1 and study 2 (Supplementary Fig. 2), replicating previous sequence memory assessments58,59, and this sequence lag effect did not vary with time—even fine-grained sequence memory accuracy (for item pairs that were adjacent or had only one intervening item) was retained from an hour (T1) to a month (T4) after encoding.

Study 2a participants were randomly assigned to either a wake or sleep condition (Fig. 1). Here we first replicated the findings observed in study 1 in the sleep condition: a main effect of time (T1–T4) (χ2(1) = 60.40, P < 0.001) was once again qualified by a time × retrieval type crossover interaction (χ2(1) = 13.85, P < 0.001) (Fig. 2c). Sequence memory performance exceeded featural memory performance after 1 month (T4, z = 2.11, P = 0.035, d = 0.47, 95% CI = [0.13, 0.80]), but not earlier (T3, z = 1.22, P = 0.221, d = 0.28, 95% CI = [−0.04, 0.60]; T2, z = 0.11, P = 0.915, d = 0.05, 95% CI = [−0.26, 0.37]; T1, z = −1.66, P = 0.097, d = −0.25, 95% CI = [−0.57, 0.07]; see Fig. 2c). As in study 1, sequence memory significantly increased overnight (z = 2.76, P = 0.006, d = 0.47, 95% CI = [0.13, 0.80]), but featural memory did not (z = 0.43, P = 0.670, d = 0.04, 95% CI = [−0.28, 0.36]; see Supplementary Fig. 3 for participant-level data).

Considering the wake group, sequence and featural memory did not differ at any time point (T4, z = 1.37, P = 0.170, d = 0.30, 95% CI = [0.03, 0.63]; T3, z = 0.46, P = 0.645, d = 0.16, 95% CI = [−0.17, 0.48]; T2, z = −0.92, P = 0.356, d = −0.08, 95% CI = [−0.41, 0.23]; T1, z = −1.47, P = 0.143, d = −0.18, 95% CI = [−0.51, 0.15]), nor was there a T1-to-T2 increase in sequence memory as seen in study 1 and the study 2a sleep group; indeed, sequence memory declined from T1 to T2, albeit non-significantly (z = −0.45, P = 0.655, d = −0.06, 95% CI = [−0.38, 0.26]). When the sleep and wake groups were analysed in the same model (T1–T2), there was a significant group × time interaction (χ2(1) = 5.86, P = 0.016), such that overall memory accuracy increased from T1 to T2 after a period of sleep (z = 2.29, P = 0.022, d = 0.315, 95% CI = [0.00, 0.63]) but not after a period of wakefulness (z = −1.13, P = 0.258, d = −0.213, 95% CI = [−0.54, 0.11]), although the group × time × retrieval type interaction was not significant (χ2(1) = 0.34, P = 0.558). Although time-of-day effects cannot be ruled out, such effects could not account for the differences between the sleep and wake groups, given that memory performance was nearly identical between groups at T1 (see Fig. 2). In addition, there were no group differences in chronotype or self-reported sleep habits (Supplementary Methods, p. 2). A model examining all 4 time points revealed the expected main effect of time (χ2(1) = 78.62, P < 0.001) and the time × retrieval type interaction (χ2(1) = 9.16, P = 0.002), but no significant interactions with group (group × time, χ2(1) = 0.39, P = 0.533; 3-way interaction, χ2(1) = 0.34, P = 0.560).

Long endurance of sleep-related mnemonic advantage

Having found relatively strong sequence memory retention at the longest delay in our original design (1 month), we created a 5th memory test (T5; matching the properties of the tests in T1–T4 and drawing test items from them equally; Methods), delivered to study 2a participants (26 and 23 from the sleep and wake groups, respectively) after approximately 15 months. This interval roughly approximated the next proportionately spaced time point, given a negatively accelerated forgetting function across time3,4. As expected, overall memory performance declined over time (χ2(1) = 169.73, P < 0.001), but it remained above chance at T5 (mean (M) = 0.63, s.d. = 0.03) (Fig. 3). As reported for the study 2a sleep group, a sleep-related advantage for sequence over featural memory performance from T1 to T4 was evident among those who completed T5 (Supplementary Methods, p. 3). This mnemonic enhancement conferred by 1 night of sleep (versus wake) held at 15 months post-encoding (group × time [T1, T5] interaction; χ2(1) = 5.57, P = 0.018). Although the 3-way group × time × retrieval type interaction was not significant (χ2(1) = 0.67, P = 0.411), hypothesis-driven planned comparisons confirmed that sequence memory performance from T1 to T5 was preserved by sleep relative to wake (that is, the T1–T5 decline in sequence memory was greater in the wake versus sleep group; z = 2.24, P = 0.025, d = 0.64, 95% CI = [0.05, 1.22]). There were no group differences in featural memory performance from T1 to T5 (z = 1.09, P = 0.274, d = 0.43, 95% CI = [−0.15, 1.10]). Although the study 2a long-term (T5) follow-up participants comprised a subset of the full sample, they were indistinguishable from the main sample based on either demographics or memory performance (Supplementary Methods, p. 3) and there was no a priori reason to expect that these individuals would show a disproportionate advantage for sequence over featural memory.
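The proportional spacing of the test delays can be illustrated with a short worked example. This is a minimal sketch rather than the authors' procedure: the delays are nominal values taken from the study design, a month is approximated as 30 days (an assumption), and the names `DELAYS_H` and `log10_positions` are hypothetical.

```python
import math

# Nominal test delays in hours. Month length (30 days) is an assumption
# made for illustration; T2 is 12 h in study 2a and 24 h in study 1.
DELAYS_H = {
    "T1": 1,
    "T2": 12,
    "T3": 7 * 24,
    "T4": 30 * 24,
    "T5": 15 * 30 * 24,
}


def log10_positions(delays):
    """Map each delay (hours) to its position on a log10 time axis."""
    return {label: math.log10(hours) for label, hours in delays.items()}


positions = log10_positions(DELAYS_H)
labels = list(positions)

# Print the step between consecutive time points in log10 units.
for earlier, later in zip(labels, labels[1:]):
    step = positions[later] - positions[earlier]
    print(f"{earlier} -> {later}: {step:.2f} log10 units")
```

Under these assumptions, the step from T4 to T5 is a little over one log10 unit, similar to the steps from T1 to T2 and from T2 to T3, which is the sense in which a 15-month delay is roughly the next proportionately spaced time point.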

Fig. 3: Long-term transformation of sequence versus featural memory.

Sequence versus featural memory accuracy from T1 to T5 in a subset of the study 2a participants who completed the ancillary T5 testing (N = 26 and 23 for sleep and wake groups, respectively). The sleep-related advantage for sequence over featural memory performance held at 15 months post-encoding. Sequence memory performance from T1 to T5 was preserved by sleep relative to wake (that is, the T1–T5 decline in sequence memory was greater in the wake versus sleep group; z = 2.24, P = 0.025, d = 0.64, 95% CI = [0.05, 1.22]). There were no group differences in the T1–T5 change in featural memory performance (z = 1.09, P = 0.274, d = 0.43, 95% CI = [−0.15, 1.10]). Dots depict means, and vertical lines and shaded regions depict between-subjects standard errors.

Overnight memory transformation is linked to neurophysiology

Study 2b participants, including the sleep group from study 2a and additional participants recruited for individual differences analyses, underwent overnight PSG recordings in the sleep laboratory between T1 and T2 (and testing at T3 and T4; N = 49 after exclusions; Supplementary Methods, p. 3). As in study 1 and study 2a, the time × retrieval type interaction, with memory for sequence preserved relative to features from T1 to T4, was significant (χ2(1) = 23.41, P < 0.001), with sequence memory significantly improving overnight (z = 2.20, P = 0.026, d = 0.36, 95% CI = [0.07, 0.66]). As in study 2a, featural memory remained flat overnight (z = 0.09, P = 0.927, d = 0.03, 95% CI = [−0.25, 0.31]). When modelling sleep macrostructure, SWS duration was positively and selectively associated with overall memory enhancement (SWS duration × time [T1, T2] interaction, F(1, 131) = 4.51, P = 0.036, partial η2 = 0.03; see Supplementary Fig. 4 and Supplementary Table 3 for null effects of other sleep stages and null interactions with retrieval type). This effect extended to SWS microstructure: both slow-wave (half wave) and spindle quantity during SWS were associated with overall memory enhancement, with both interacting with time (F(1, 141) = 7.43 and 8.66, P = 0.007 and 0.004, partial η2 = 0.05 and 0.06, respectively; see Supplementary Table 4 and Supplementary Fig. 4). Critically, spindles coupled to slow waves (but not uncoupled spindles) were associated with overall memory enhancement (interactions with time, for coupled and uncoupled spindles, respectively, F(1, 138) = 9.32, P = 0.003, partial η2 = 0.06; and F(1, 138) = 0.070, P = 0.792, partial η2 ≈ 0.00; Supplementary Table 5 and Supplementary Fig. 5). An alternative method for identifying spindle–slow wave coupling, which flexibly identified spindles occurring during the upstate versus downstate of individual slow waves, replicated this finding and highlighted the particular importance of coupling in the slow-wave upstate (Supplementary Results, p. 15, and Supplementary Fig. 6).
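The distinction between coupled and uncoupled spindles can be made concrete with a simplified temporal co-occurrence rule. The sketch below is an illustration only and is not the detection pipeline used in this study, whose criteria are given in the Methods and Supplementary Information. It assumes that spindles and slow waves have already been detected, that slow waves are non-overlapping (start, end) intervals in seconds, and that a spindle counts as coupled when its peak falls inside a slow-wave interval. The function name `split_spindles` is hypothetical.

```python
from bisect import bisect_right


def split_spindles(spindle_peaks, slow_waves):
    """Split spindle peak times into coupled and uncoupled lists.

    spindle_peaks: iterable of peak times in seconds.
    slow_waves: iterable of non-overlapping (start, end) intervals in seconds.
    Assumed rule (illustrative): a spindle is coupled if its peak lies
    within any slow-wave interval, endpoints included.
    """
    waves = sorted(slow_waves)
    starts = [start for start, _ in waves]
    coupled, uncoupled = [], []
    for peak in spindle_peaks:
        # Index of the last slow wave that starts at or before the peak.
        i = bisect_right(starts, peak) - 1
        if i >= 0 and peak <= waves[i][1]:
            coupled.append(peak)
        else:
            uncoupled.append(peak)
    return coupled, uncoupled


# Toy data: two slow waves and three spindle peaks (made-up times).
coupled, uncoupled = split_spindles([1.2, 5.0, 9.7], [(1.0, 2.0), (9.0, 10.0)])
print(len(coupled), "coupled;", len(uncoupled), "uncoupled")
```

A phase-based definition, such as the upstate versus downstate approach mentioned above, would replace the interval test with a check on the slow-wave phase at the spindle peak.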

Given the behavioural dissociation between overnight change in sequence and featural memory reported above, these measures are visualized separately in relation to coupled and uncoupled spindles (Fig. 4). Overnight sequence memory enhancement was selectively associated with coupled spindles, and significantly more so than with uncoupled spindles. Featural memory showed a similar pattern, although its correlations with coupled versus uncoupled spindles did not differ significantly (Fig. 4).

Fig. 4: Relationship between spindle–slow wave coupling and overnight memory change.

Study 2b Pearson correlations between sleep neurophysiology and memory performance (N = 49). The line reflects the best fit with 95% CIs shown. a, Sequence memory change (T2 − T1) was significantly associated with coupled (r(47) = 0.42, P = 0.003, 95% CI = [0.156, 0.626]; uncorrected, two-sided significance test) but not uncoupled (r(47) = 0.01, P = 0.921, 95% CI = [−0.268, 0.295]) spindles in SWS. These correlations are statistically different from each other (Steiger’s z = 2.24, P = 0.029). b, Similarly, featural memory change (T2 − T1) was related to coupled (r(47) = 0.34, P = 0.018, 95% CI = [0.060, 0.564]) but not uncoupled (r(47) = 0.11, P = 0.451, 95% CI = [−0.177, 0.380]) spindles in SWS. These correlations are not statistically different from each other (Steiger’s z = 1.21, P = 0.231). Considering coupled spindles alone, the correlations for sequence versus featural memory change are not statistically different from each other (Steiger’s z = 0.52, P = 0.61).
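For readers who wish to check the intervals above, a 95% CI for a Pearson correlation can be approximated with the Fisher z-transformation. This is a minimal sketch that assumes the Fisher method; the caption does not state how the reported CIs were computed, and the function name `pearson_ci` is hypothetical. The sample size follows from the reported degrees of freedom (r(47), so n = 49).

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate two-sided 95% CI for a Pearson r via Fisher's z.

    r: sample correlation; n: number of paired observations (n > 3).
    z_crit: standard normal critical value for the 95% level.
    """
    z = math.atanh(r)                 # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Coupled spindles versus sequence memory change: r(47) = 0.42, so n = 49.
lo, hi = pearson_ci(0.42, 49)
print(f"95% CI = [{lo:.3f}, {hi:.3f}]")
```

The result lies close to the reported interval of [0.156, 0.626]; small discrepancies in the third decimal are expected because the input r is rounded to two decimals.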

Discussion

Memories fade with time, yet memory change is not uniform; although most details from our experiences are forgotten, some can persist with fidelity even after years36. Sleep’s influence on the fate of our memories has been studied for at least a century (for example, ref. 7). Theories of sleep-related memory consolidation hinge on the notion that sleep’s effects on memory are active and selective, leading memories to be qualitatively transformed, not merely protected8,23,24. However, the case for preferential consolidation of certain types of episodic memory over others has been complicated in recent years by failed replications6,27 and, accordingly, the effect of sleep on memory consolidation is still actively debated10,28. Using an immersive real-world encoding event, we found that sleep selectively and actively enhanced memory for sequence associations—but not featural associations—despite sequence and featural memory probes being matched on difficulty overall. This preferential enhancement of sequence versus featural memory was eliminated when the T1–T2 interval was filled by wakefulness, indicating that it cannot be explained by other elements of our research design, such as properties of the T1 test. A lasting sleep-related advantage for sequence over featural memory was evident across two independent samples and persisted after 15 months, extending previous findings of long-lasting sleep-related mnemonic enhancement from laboratory stimuli60 to an immersive real-world event, specifically for sequential (versus static featural) associations.

In accordance with active, as opposed to passive or ‘permissive’ theories of sleep-dependent transformation or consolidation12,19, we found that the duration of SWS (but not other sleep stages) and SWS’s neural hallmarks, including slow waves and spindles, were associated with overnight memory change. Specifically, spindles coupled to slow waves uniquely predicted overnight memory enhancement, aligning with contemporary models of hippocampal–neocortical communication in sleep-related mnemonic processing17,26 (for a review, see ref. 19). This study was not designed to distinguish the directionality of the relationship between the learning episode itself and interindividual characteristics in sleep physiology. On the one hand, learning manipulations can induce changes in sleep spindles and slow oscillations (SOs)61,62,63. On the other hand, the Baycrest Tour was—by design—just one event in the participants’ day before sleep. There is evidence that interindividual differences in learning, cognition and intelligence are related to sleep physiology28,63,64,65,66, particularly spindle–SO coupling67. Whatever the direction of the relationship, our results show that the interplay of spindles and SOs supports memory transformation, particularly for sequences of everyday events.

Contrary to our hypothesis and behavioural evidence for sleep’s selective effect on sequence memory, SWS duration and neurophysiology predicted both sequence and featural memory change, although there was evidence for greater specificity to spindle–SO coupling for sequence relative to featural memory. We speculate that design differences mitigated the dissociation of sequence and featural memory as assessed at T2 in studies 2a and 2b. In study 1 there was a crossover interaction such that sequence memory improved whereas featural memory declined overnight. Although the above-baseline increase in sequence memory replicated in the study 2a sleep group, featural memory performance stayed flat from T1 to T2, after which it declined as expected across subsequent assessments. This unexpected overnight protection of featural memory may be attributable to the fact that study 2a participants slept shortly after encoding (that is, in less than 2 h versus up to 15 h in study 1), suggesting that sleep may play a passive, interference-reducing role in protecting atemporal associations in contrast to an active strengthening of sequential associations. Although the study 2a and 2b sample sizes were large relative to related studies62,68,69,70,71 and were adequately powered to replicate the effects of study 1, the better-than-expected featural memory performance at T2 meant that the interactions between retrieval type and group (study 2a; sleep versus wake) and neurophysiology (study 2b) were attenuated. Instead, effects were observed for overall memory enhancement, although the critical time × retrieval type interaction, with sequence and featural forgetting functions crossing after a night’s sleep, held in all three groups (between T1 and T2 in study 1 and study 2a sleep groups, and between T2 and T3 in study 2a wake group). Indeed, the dissociation held after 15 months, highlighting the importance of pairing serial, independent and delayed assessments in studies of human memory transformation.

The sleep-specific, above-baseline improvement in episodic sequence memory retrieval is an exception to the lawful monotonic decline of episodic memory with time3,4. This finding is also consistent with the hypothesized role of sequential and time-compressed replay, identified mainly in rodents17 and more recently in humans72, in the stabilization or enhancement of sequential organization in episodic memory19,50. Whereas real-life episodic memory encoding usually occurs through volitional action in large-scale environments, human participants in most laboratory memory studies passively encode discrete memoranda at a fixed location (for example, words or pictures on a computer screen). In rodents, active self-motion through space (versus being passively transported on a rodent-sized model train) is necessary to facilitate subsequent sleep-related replay49. Similarly, immersive real-world encoding improves human episodic memory relative to standard laboratory conditions53. Recall of such action sequences encoded in large-scale space may therefore be enhanced by hippocampal reactivation during sleep to a greater degree than recall of simultaneously encoded perceptual associations and could rely on strengthening of different heteroassociative versus autoassociative circuits73 or hippocampal–cortical networks47. Active and immersive encoding paradigms such as the one used in this study may also explain the striking long-term retention of sequential structure in previous real-world episodic memory studies46,74,75.

Given previous observations of altered spindle–SO coupling during sleep in older adults76,77,78, and disorganized neural replay in aged rodents79, dysfunctional sleep oscillatory coordination could be a candidate mechanism for the age-related loss of temporal organization in real-world memory retrieval, as previously shown with the Baycrest Tour46. Future investigations could inform compensatory techniques or therapeutic interventions, such as oscillatory entrainment80 and/or targeted memory reactivation during sleep81,82 or wakefulness (for example, ref. 83).

The differentiated long-term forgetting curves for sequence and featural memory have implications for theories of memory consolidation. First, this dissociation builds on decades-old findings that different elements of complex events are forgotten at different rates38,84, with associations across events being better remembered than associations within events35. Systems consolidation theories, however, focus on sleep’s role in extracting gist or structure shared across similar repeated episodes (thought to be cortex dependent), rather than preserving or enhancing episode-specific information (thought to be hippocampus dependent19,29). To the extent that sequential (that is, spatio-temporal) information reflects an underlying event structure, the present results advance these theories, but also highlight that such structure can be extracted even from a one-shot event, much as flexible neural replay can emerge after a single maze traversal in rodents85. The persistence of sequence memory for a single experience over a year raises the question of whether it comes to rely on cortical mechanisms or rather reflects an enduring role for the hippocampus in remote episodic memory access and sequential organization56,86. Moreover, further research is needed to assess the effects of arousal or subjective engagement with the memoranda, the timing of sleep relative to encoding (discussed above), the interaction of sleep and testing effects, the temporal evolution of memory transformation over multiple sleep–wake cycles, the retrieval cue methodology and the interactions of these factors.

Why, then, is sequence information so sticky? Memories for the sequence in which events occurred provide the building blocks for future predictions and simulations87 and enable retroactive and delayed linking of actions to their eventual outcomes, consistent with replay’s putative role in back-propagating reward value to preceding behavioural trajectories88,89,90. Sequential associations might therefore inform a latent structural representation—or cognitive map—upon which other episodic features (for example, visual, auditory, tactile features) are laid out86,91. Indeed, spatio-temporal associations between events automatically shape the way we search memory during natural, unconstrained free recall46,92, and sleep tends to benefit free recall tasks more than cued recall or recognition6. This might explain why spatio-temporal aspects of episodic memory are privileged over atemporal featural associations after sleep and in the weeks and months that follow.

By marrying a controlled but immersive real-world encoding paradigm, between-subjects manipulation of sleep, serial assessment of different components of episodic memory with matched trial-unique tests, and individual differences in overnight electrophysiology, this study deepens our understanding of how sleep transforms memory over the short and long term. The observation of an enduring overnight advantage for sequence versus featural memory for a one-shot event, linked to underlying neural mechanisms of SWS, provides critical new evidence for theories of active memory consolidation in humans.

Methods

The Baycrest Tour

Participants underwent a 20-min audio-guided walking tour of a section of Baycrest Health Sciences Centre lined with artworks (Fig. 1a; see also ref. 46). The audio guide was presented on a portable digital player with headphones, similar to those used in museums. A total of 33 tracks corresponding to sequential items along the tour route were presented in a fixed order. Each track directed participants to a given item (for example, painting, sculpture, photograph), instructed participants to examine the item (followed by 10 s of silence) and then provided information about that item (for example, the artist’s name, the medium). The audio guide then directed participants to walk to the next item, whereupon the participant initiated the next track. The guide thus controlled—in addition to featural content and sequence—the encoding duration for each item, allowing for individual differences in walking speed between items. The tour took 20.83 min on average (range of tour duration was 16–28 min across studies). The audio guide is openly accessible at our OSF repository: https://osf.io/bxm5w/.

Memory test design

All memory tests were implemented using the Qualtrics online platform. A total of 276 true/false statements were created that pertained to either item features (details) or sequences (3 statements were excluded from final analyses due to poor average accuracy (<0.4) in study 1). Featural statements refer to details such as the colour or shape of a given piece (for example, “The sculpture called One Nine North is dark red”), whereas sequence statements refer to the spatio-temporal order of pairs of items from the tour (for example, “You encountered the sculpture called One Nine North before the Spiro Family Gardens painting”). Sequence statements pertained to 25 ‘target events’ encoded during the tour in a fixed, ordinal position, as ensured by the audio guide and the unidirectional track-like layout of the tour. We use the word ‘sequence’ to encompass temporal and spatial associations, which are inherently confounded in unidirectional real-world navigation86. Sequence statements were subclassified as near, medium or far based on their encoded inter-item lag (that is, the number of intervening target items between the pair in question; 0–1, 2–3 or 4–6), which was taken into consideration when constructing the test forms. False statements were created by altering details or reversing the sequence.
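The lag binning described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and it assumes each target event is indexed by its ordinal position along the tour (1 to 25), with lag defined as the number of target items lying strictly between the two items in a statement.

```python
def inter_item_lag(pos_a: int, pos_b: int) -> int:
    """Number of target items lying strictly between two ordinal positions."""
    if pos_a == pos_b:
        raise ValueError("a sequence statement needs two different items")
    return abs(pos_a - pos_b) - 1


def lag_category(pos_a: int, pos_b: int) -> str:
    """Bin a sequence statement as near (0-1), medium (2-3) or far (4-6)."""
    lag = inter_item_lag(pos_a, pos_b)
    if lag <= 1:
        return "near"
    if lag <= 3:
        return "medium"
    if lag <= 6:
        return "far"
    # Assumption: lags above 6 were not used, since the text lists no such bin.
    raise ValueError("lag above 6 falls outside the bins described in the text")
```

Under these assumptions, a statement pairing adjacent target events (positions 3 and 4) has a lag of 0 and is binned as near, whereas one pairing positions 1 and 6 has a lag of 4 and is binned as far.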

The statements were distributed across 4 equivalent 69-item test forms (considering proportions of true/false, featural/sequence statements, sequence lag, reference to different target events across the tour, and item difficulty based on pilot testing at a 1-day delay in an independent sample), each containing 34 feature and 35 sequence items. The order of test forms across test sessions T1–T4 was counterbalanced using a Latin square design, creating four different form orders (ABDC, BCAD, CDBA and DACB) to which participants were randomly assigned, minimizing any order effects in group-level results. Participants were not given corrective feedback.
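As a quick check on the counterbalancing scheme, the sketch below verifies that the four form orders listed above form a Latin square, so that each form appears exactly once per participant and once per test session. The helper names are hypothetical, and the simple uniform draw in `assign_order` is an assumption, as the exact randomization procedure is not specified here.

```python
import random

FORM_ORDERS = ["ABDC", "BCAD", "CDBA", "DACB"]  # orders for sessions T1-T4


def is_latin_square(orders):
    """True if every form occurs once in each row and once in each column."""
    n = len(orders)
    symbols = set(orders[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(o) == n and set(o) == symbols for o in orders)
    cols_ok = rows_ok and all(
        {o[i] for o in orders} == symbols for i in range(n)
    )
    return rows_ok and cols_ok


def assign_order(rng: random.Random) -> str:
    """Assumed procedure: draw one of the four orders uniformly at random."""
    return rng.choice(FORM_ORDERS)
```

Calling `is_latin_square(FORM_ORDERS)` returns `True` for the orders reported above.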

Given the generally high performance at the 4th time point (1 month), we created a 5th recognition memory test (T5) post hoc to probe memory performance at an even longer delay (approximately 15 months; study 2a), one that is typical of naturalistic autobiographical memory research but rarely probed with a controlled and verifiable laboratory assessment. As this was not part of the original test design, 100 items (50 featural and 50 sequence) were selected from the original 4 test forms in equal proportion for the T5 test. These items were used in the exact same format as in previous tests. Nine items were subsequently excluded, eight because a coding error resulted in duplicate statements and one because of poor average performance, leaving 49 featural and 42 sequence items. The featural and sequence item groups administered at T5 were matched for difficulty (based on T1–T4 item analysis). T5 items also sampled all tour target events/placements and inter-item lag (for sequence items) equally. Although these items were repeated from earlier tests, the effects of prior testing were expected to be minimal given the passage of time (over 1 year from the previous test) and the absence of corrective feedback. That said, potential testing effects should be considered in interpreting the T5 data93. In any case, as the featural and sequence items were matched for superficial characteristics of prior presentation, any differences between the two item types could not be attributed to these factors.

Procedure

Participants were recruited via the Rotman Research Institute participant database or from online advertisements. Participants were screened for neurological or psychiatric disorders or other illnesses affecting cognition, substance abuse, medications affecting memory or sleep (for example, benzodiazepines) and prior exposure to the tour location. Study 2a and study 3 participants were asked to refrain from ingesting alcohol and other psychoactive drugs the day before and the day of the experimental session. Caffeinated beverages were permitted for all habitual coffee and tea drinkers on the day of the experiment. Study 2a participants randomized to the wake group agreed not to nap during the day.

Both studies 1 and 2 used our Baycrest Tour 2.0 encoding paradigm46 (distinct in content and location from Baycrest Tour 1.0 (refs. 36,53)). Participants entered Baycrest on the first floor such that the tour location (second floor) was avoided. Following instructions and practice with the digital audio players in a private testing room, participants were escorted via elevator to the tour start position. They completed the tour independently, with an experimenter unobtrusively following to ensure adherence to the protocol and to address potential technical issues. After the tour, participants returned to the testing room and completed a roughly 45-min battery of questionnaires and neuropsychological tests, followed by test 1, administered online (via Qualtrics.com) to ensure consistency with tests 2–4.

In study 1, we assessed feature versus sequence memory for the Baycrest Tour across four test sessions, with sleep occurring between T1 and T2. The tour and first testing session were conducted during normal business hours according to the participants’ availability, with T2 completed online 24 h later.

Study 2a participants consented to random assignment to sleep or wake groups at the time of recruitment; randomization occurred after study enrolment to reduce bias in recruitment into the sleep or wake condition should someone prefer to be enrolled in one condition over the other (Fig. 1b). The sleep group completed the Baycrest Tour at 6.30 p.m. and were then escorted via taxi to Sunnybrook Hospital (a 15-min drive from Baycrest) where the sleep laboratory is located. The sleep window was between 10.30 p.m. and 6.30 a.m., followed by T2 within 45 min of waking. Participants in the wake group completed the tour between 9.00 a.m. and 10.00 a.m. and were instructed to stay awake for the next 12 h before T2. As in study 1, study 2a participants completed T1 testing in the laboratory and T2–T4 (and T5) remotely. All memory tests were administered on Qualtrics.com.

After randomization to sleep or wake groups in study 2a was completed, we continued to recruit participants into the sleep group to increase the sample size for the assessment of brain–behaviour relationships derived from sleep electroencephalography (study 2b). This phase of the study was terminated after 10 participants were tested due to the onset of the COVID-19 pandemic, although these participants completed all subsequent tests, as was the case for the study 2a sleep group.

Tests 3 and 4 were scheduled at 1 week and 1 month post-encoding for all studies, with the ancillary T5 test occurring 15 months post-encoding (study 2a). The increasing delay across test intervals approximates the well-established power function of forgetting4. The classic negatively accelerated forgetting function3,4 produces a linear decline across the roughly logarithmic test spacing. (We note that this function was established in research that did not distinguish among elements of episodic memory, such as features and sequences.) Participants received scheduled email reminders in advance of T2–T5 testing, including instructions to complete the tests within the required time window. Tests completed at delays within 15% of the target time (relative to the total time elapsed from encoding) were accepted (see Supplementary Table 1 for testing lags and exclusions).
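The acceptance rule for test timing can be illustrated as follows. This is a minimal sketch with hypothetical names; it assumes that delays are expressed in hours since encoding and that the 15% tolerance applies symmetrically around the target delay.

```python
def within_window(actual_h: float, target_h: float, tolerance: float = 0.15) -> bool:
    """Accept a test if its delay deviates from target by at most `tolerance`.

    Both delays are measured in hours from encoding, so the tolerance scales
    with the total elapsed time (an assumed reading of the rule in the text).
    """
    if target_h <= 0:
        raise ValueError("target delay must be positive")
    return abs(actual_h - target_h) <= tolerance * target_h
```

For example, with a 24-h target, a test completed 26 h after encoding (about 8% late) would be accepted, whereas one completed at 28 h (about 17% late) would not.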

This research was approved by the Baycrest Ethics Board. All participants gave informed consent.

PSG

Conventional PSG data were collected from study 2a and study 2b participants using the Compumedics system in accordance with American Academy of Sleep Medicine standards, with frontal (F3 and F4), central (C3 and C4) and occipital (O1 and O2) placement following the International 10–20 system and referenced to the contralateral mastoid (A1 and A2; see Supplementary Methods, p. 4). Sleep scoring was carried out and reviewed by registered sleep technologists at the Sunnybrook Health Sciences Centre. Movement and arousal artefact identification was conducted manually by expert raters. Spindles were detected at the sites where they appear maximally (C3 and C4 derivations) during artefact-free N2 and N3 sleep with a well-established, automated and validated method94, implemented in the EEGlab plugin95 detect_spindles2.2 written for MATLAB R2019a (MathWorks). Additional exploratory analyses of slow (11–13.5 Hz) versus fast (13.5–16 Hz) spindle counts are reported in the Supplementary Results (pp. 15–16). Half waves (a measure of slow-wave activity) were extracted from artefact-free N2 and N3 sleep at the sites where they appear maximally (F3 and F4 derivations) with a well-established automatic period amplitude analysis (PAA) method, implemented in the EEGlab PAA plugin written for MATLAB R2019a (MathWorks). Spindles and slow waves were visually verified after automated detection. See Supplementary Methods (pp. 4–5) for detailed PSG methods and Supplementary Table 2 for descriptive macrostructural and microstructural sleep data.

Slow wave–spindle coupling

Using the slow-wave negative peak latencies from F3 and F4 and the spindle peak latencies from C3 and C4, we performed coupling detection procedures following the approach originally developed and validated in refs. 20,67,96,97,98,99, using EEGlab-compatible software95 written for MATLAB R2019b (MathWorks). To ensure that spindles would only be coupled to slow waves detected on the same hemisphere, electrode pairings F3–C3 and F4–C4 were made. This procedure involved building a time window of 4 s (that is, ±2 s) around the negative peaks of the detected slow waves. Spindles were identified as coupled slow wave–spindle (SW–SP) complexes if the spindle peak latency fell within the 4-s time window around a slow-wave negative peak detected in the corresponding slow-wave channel (Supplementary Fig. 6a). Conversely, spindles were marked as uncoupled if the spindle peak latency fell outside every such 4-s time window. Results of an alternative method that flexibly identifies spindles occurring during the peak (upstate) versus trough (downstate) of each individual slow oscillation (SO) event did not affect interpretation (Supplementary Results, p. 15).
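The time-window rule above can be sketched as follows for a single electrode pairing (for example, F3–C3). This is an illustrative sketch rather than the EEGlab-compatible software used in the study: the function name is hypothetical, latencies are assumed to be in seconds, and treating the ±2-s boundary as inclusive is an assumption.

```python
from bisect import bisect_left


def classify_spindles(sw_neg_peaks_s, spindle_peaks_s, half_window_s=2.0):
    """Split spindle peak latencies into coupled and uncoupled lists.

    A spindle counts as coupled if its peak lies within +/- half_window_s of
    the nearest slow-wave negative peak in the paired channel.
    """
    sw = sorted(sw_neg_peaks_s)
    coupled, uncoupled = [], []
    for t in spindle_peaks_s:
        i = bisect_left(sw, t)
        # Only the neighbours on either side of t can be the nearest peak.
        nearest = min(
            (abs(t - sw[j]) for j in (i - 1, i) if 0 <= j < len(sw)),
            default=float("inf"),
        )
        if nearest <= half_window_s:  # inclusive boundary is an assumption
            coupled.append(t)
        else:
            uncoupled.append(t)
    return coupled, uncoupled
```

With slow-wave negative peaks at 10 s and 50 s, spindle peaks at 9.0 s and 48.5 s would be classified as coupled, and peaks at 12.5 s and 30.0 s as uncoupled.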

The detection results for each electrode pairing were averaged for use in the final analyses. To address duplicate detection of the same slow-wave event by the different channels, we removed events in which the latency difference was lower than the minimum duration threshold for the half wave (0.125 s). Lag was measured as the distance between the slow-wave negative peak and the spindle onset. The lag average (M = 14.54 ms, s.d. = 25.31 ms) was then calculated for each individual as a measure of coupling strength.
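The duplicate-removal step can be illustrated with the sketch below. The function name is hypothetical, and two details are assumptions not stated above: events from both channels are pooled and sorted by latency, and the earlier event of each near-coincident pair is the one retained.

```python
def drop_duplicate_events(latencies_s, min_separation_s=0.125):
    """Remove events closer than min_separation_s to the previously kept event.

    min_separation_s defaults to the minimum half-wave duration (0.125 s).
    """
    kept = []
    for t in sorted(latencies_s):
        # Assumption: keep the earlier event, drop the later near-duplicate.
        if not kept or t - kept[-1] >= min_separation_s:
            kept.append(t)
    return kept
```

For example, events at 1.00 s and 1.05 s would be treated as one slow wave and only the first retained, whereas events at 2.0 s and 2.2 s would both be kept.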

In addition, the phase of the band-pass-filtered slow-wave signal in radians at the spindle peak latency was computed (Supplementary Fig. 6). The mean direction of the phase angles for all coupled spindle events was determined using the CircStat toolbox100. A Hilbert transform was applied to extract the preferred phase of SW–SP coupling for each participant, averaging all individual events’ preferred phases. We then tested for non-uniformity of the phase distribution (Rayleigh test) and for clustering around the positive slow-wave peak as the predefined mean direction (V-test). Individual-level analyses revealed a non-uniform distribution (P < 0.001, Rayleigh test) of the preferred phases of SW–SP in all 49 participants. The coupling of spindle events within the slow-wave cycle was maximal shortly before the upstate peak in all 49 participants (0° (slow-wave positive peak); P < 0.001, V-test), suggesting that spindles were coupled to slow waves preferentially adjacent to or before the positive slow-wave peak (Supplementary Fig. 6b). For the purposes of this article, we analysed the count of coupled and uncoupled spindles in SWS averaged across channels.
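To make the circular statistics concrete, the sketch below computes the mean direction, a Rayleigh test and a V-test from a list of phase angles in radians. It is not the CircStat implementation used in the study: the function names are hypothetical, the Rayleigh P value uses a standard closed-form approximation, and the V-test P value uses a one-sided normal approximation.

```python
import math


def circ_mean_and_r(angles):
    """Mean direction (radians) and mean resultant length of phase angles."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return math.atan2(s, c), math.hypot(c, s)


def rayleigh_p(angles):
    """Approximate P value for the Rayleigh test of non-uniformity."""
    n = len(angles)
    _, rbar = circ_mean_and_r(angles)
    big_r = n * rbar  # resultant vector length
    return math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - big_r * big_r)) - (1 + 2 * n))


def v_test_p(angles, mu0=0.0):
    """Approximate P value for clustering around a predefined direction mu0.

    mu0 = 0 corresponds to the slow-wave positive peak in the text.
    """
    n = len(angles)
    mean_dir, rbar = circ_mean_and_r(angles)
    v = n * rbar * math.cos(mean_dir - mu0)
    u = v * math.sqrt(2.0 / n)
    return 0.5 * math.erfc(u / math.sqrt(2.0))  # upper tail of standard normal
```

Phases tightly clustered around 0 yield small P values on both tests, whereas phases clustered around π yield a small Rayleigh P value but a V-test P value near 1 when `mu0=0`, which is why the V-test is the more specific check of coupling to the upstate peak.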

Analysis

All analyses were conducted in RStudio with R version 4.2.3 (2023-03-15) (ref. 101). For our main behavioural models, we used generalized linear mixed effects models (GLMMs) with a logit link function, using the glmer function from the lme4 package in R. This allowed us to predict accuracy (correct versus incorrect) on a trial-wise basis from both time (T1–T2 or T1–T4) and retrieval type (featural versus sequence; contrast-coded). For omnibus effects, we modelled time as a continuous variable (mean centred), based on our theoretically motivated test spacing, as described above. GLMMs included a random intercept for subject and test item. Study 2a analyses included an additional fixed effect for group (wake or sleep).
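For concreteness, one way to write the behavioural model described above is given below. This is a sketch under stated assumptions rather than the exact fitted specification: the symbols are introduced here for illustration, the inclusion of the time × retrieval type interaction is inferred from the later decomposition of interactions, and the numerical values of the contrast codes are not specified in the text.

```latex
% i indexes participants, j indexes test items
% y_{ij}: 1 if participant i answered item j correctly, 0 otherwise
% t_{ij}: mean-centred time of the session in which i saw item j
% r_j:    contrast-coded retrieval type of item j (featural vs sequence)
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0 + \beta_1 t_{ij} + \beta_2 r_j + \beta_3\, t_{ij} r_j + u_i + v_j,
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
v_j \sim \mathcal{N}(0, \sigma_v^2)
```

Here $u_i$ and $v_j$ are the random intercepts for subject and test item. For study 2a, group (wake or sleep) would enter as an additional fixed effect; which interactions with group were included is not stated in this section.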

Omnibus significance tests were computed using type III Wald chi-squared tests, implemented using the car package in R. To decompose interactions into simple effects of time or retrieval type, we then reran models with time as a categorical variable and reported pairwise differences between model-fitted estimated marginal means (using the emmeans package in R). We report z-scores and uncorrected P values for these tests of estimated marginal means and also report Cohen’s d (on ‘raw’ and not model-fitted comparisons for interpretability; with paired or unpaired comparisons as appropriate, where paired effect sizes were computed as the mean of the paired differences divided by their standard deviation). For study 2b, linear mixed effects models were used to test the effects of sleep macrostructure (N1, N2, N3 and REM, with total sleep time as a covariate of no interest) on memory (effects on memory change were modelled by interacting each PSG predictor with a time (T1, T2) factor). We subsequently tested the effects on memory performance of the putative key microstructure measures in SWS, namely spindle and half-wave counts averaged over central and frontal electrodes, respectively. A separate analysis was run to test the effects of coupled versus uncoupled spindles on memory performance. For all models, time (T1, T2) and retrieval type (sequence, featural) were treated as fixed effects and participants were treated as a random effect. Partial η² values are reported as measures of effect size for linear mixed effects models (using the effectsize package in R). Follow-up associations between sleep neurophysiology parameters and memory change scores (T2 − T1 scores) were assessed with Pearson’s product-moment correlations, with 95% CIs, using the cor.test function in R. All significance tests are two sided. See Supplementary Methods (p. 5) for additional analysis details.
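The two simplest quantities in this pipeline, the paired Cohen's d and the Pearson correlation with its 95% CI, can be sketched without any statistical library. The function names are hypothetical and this is not the R code used in the study; the CI is computed with the Fisher z transform, which is the approach documented for cor.test, and requires more than three observations and a correlation that is not exactly ±1.

```python
import math
from statistics import mean, stdev


def paired_cohens_d(x, y):
    """Mean of the paired differences divided by their standard deviation."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / stdev(diffs)


def pearson_r_ci(x, y, z_crit=1.959964):
    """Pearson r with a two-sided 95% CI based on the Fisher z transform."""
    n = len(x)
    if n != len(y) or n <= 3:
        raise ValueError("need equal-length samples with more than 3 points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r)            # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)  # standard error on the z scale
    return r, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

In the present design, `x` and `y` for the paired effect size would be, for example, per-participant accuracy at T2 and T1, and for the correlation a sleep parameter (such as coupled spindle count) and the T2 − T1 change score.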

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.